Search results for "Correlation clustering"

showing 10 items of 28 documents

Quantum clustering in non-spherical data distributions: Finding a suitable number of clusters

2017

Quantum Clustering (QC) provides an alternative approach to clustering algorithms, several of which are based on geometric relationships between data points. Instead, QC makes use of quantum mechanics concepts to find structures (clusters) in data sets by finding the minima of a quantum potential. The starting point of QC is a Parzen estimator with a fixed length scale, which significantly affects the final cluster allocation. This dependence on an adjustable parameter is common to other methods. We propose a framework to find suitable values of the length parameter σ by optimising twin measures of cluster separation and consistency for a given cluster number. This is an extension of the Se…

0301 basic medicineClustering high-dimensional dataMathematical optimizationCognitive NeuroscienceSingle-linkage clusteringCorrelation clustering02 engineering and technologyComputer Science ApplicationsHierarchical clusteringDetermining the number of clusters in a data set03 medical and health sciences030104 developmental biologyArtificial Intelligence0202 electrical engineering electronic engineering information engineeringCluster (physics)020201 artificial intelligence & image processingQACluster analysisAlgorithmk-medians clusteringMathematicsNeurocomputing

researchProduct

CUDA-enabled hierarchical ward clustering of protein structures based on the nearest neighbour chain algorithm

2015

Clustering of molecular systems according to their three-dimensional structure is an important step in many bioinformatics workflows. In applications such as docking or structure prediction, many algorithms initially generate large numbers of candidate poses (or decoys), which are then clustered to allow for subsequent computationally expensive evaluations of reasonable representatives. Since the number of such candidates can easily range from thousands to millions, performing the clustering on standard central processing units (CPUs) is highly time consuming. In this paper, we analyse and evaluate different approaches to parallelize the nearest neighbour chain algorithm to perform hierarc…

0301 basic medicineSpeedupComputer scienceCorrelation clusteringParallel computingTheoretical Computer Science03 medical and health sciencesCUDA030104 developmental biologyHardware and ArchitectureCluster analysisAlgorithmSoftwareWard's methodThe International Journal of High Performance Computing Applications

researchProduct

Fast dendrogram-based OTU clustering using sequence embedding

2014

Biodiversity assessment is an important step in a metagenomic processing pipeline. The biodiversity of a microbial metagenome is often estimated by grouping its 16S rRNA reads into operational taxonomic units or OTUs. These metagenomic datasets are typically large and hence require effective yet accurate computational methods for processing.In this paper, we introduce a new hierarchical clustering method called CRiSPy-Embed which aims to produce high-quality clustering results at a low computational cost. We tackle two computational issues of the current OTU hierarchical clustering approach: (1) the compute-intensive sequence alignment operation for building the distance matrix and (2) the …

Brown clusteringCURE data clustering algorithmSingle-linkage clusteringCorrelation clusteringCanopy clustering algorithmData miningBiologyHierarchical clustering of networksCluster analysiscomputer.software_genrecomputerHierarchical clusteringProceedings of the 5th ACM Conference on Bioinformatics, Computational Biology, and Health Informatics

researchProduct

SMART: Unique splitting-while-merging framework for gene clustering

2014

© 2014 Fa et al. This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited. Successful clustering algorithms are highly dependent on parameter settings. The clustering performance degrades significantly unless parameters are properly set, and yet, it is difficult to set these parameters a priori. To address this issue, in this paper, we propose a unique splitting-while-merging clustering framework, named "splitting merging awareness tactics" (SMART), which does not require any a priori knowledge of either the number …

Clustering algorithmsMicroarrayslcsh:MedicineGene ExpressionBioinformaticscomputer.software_genreCell SignalingData MiningCluster Analysislcsh:ScienceFinite mixture modelOligonucleotide Array Sequence AnalysisPhysicsMultidisciplinarySMART frameworkConstrained clusteringCompetitive learning modelBioassays and Physiological AnalysisMultigene FamilyCanopy clustering algorithmEngineering and TechnologyData miningInformation TechnologyGenomic Signal ProcessingAlgorithmsResearch ArticleSignal TransductionComputer and Information SciencesFuzzy clusteringCorrelation clusteringResearch and Analysis MethodsClusteringMolecular GeneticsCURE data clustering algorithmGeneticsGene RegulationCluster analysista113Gene Expression Profilinglcsh:RBiology and Life SciencesComputational BiologyCell BiologyDetermining the number of clusters in a data setComputingMethodologies_PATTERNRECOGNITIONSplitting-merging awareness tactics (SMART)Signal ProcessingAffinity propagationlcsh:QGene expressionClustering frameworkcomputer

researchProduct

GenClust: A genetic algorithm for clustering gene expression data

2005

Abstract Background Clustering is a key step in the analysis of gene expression data, and in fact, many classical clustering algorithms are used, or more innovative ones have been designed and validated for the task. Despite the widespread use of artificial intelligence techniques in bioinformatics and, more generally, data analysis, there are very few clustering algorithms based on the genetic paradigm, yet that paradigm has great potential in finding good heuristic solutions to a difficult optimization problem such as clustering. Results GenClust is a new genetic algorithm for clustering gene expression data. It has two key features: (a) a novel coding of the search space that is simple, …

Clustering high-dimensional dataDNA ComplementaryComputer scienceRand indexCorrelation clusteringOligonucleotidesEvolutionary algorithmlcsh:Computer applications to medicine. Medical informaticscomputer.software_genreBiochemistryPattern Recognition AutomatedBiclusteringOpen Reading FramesStructural BiologyCURE data clustering algorithmConsensus clusteringGenetic algorithmCluster AnalysisCluster analysislcsh:QH301-705.5Molecular BiologyGene expression data Clustering Evolutionary algorithmsOligonucleotide Array Sequence AnalysisModels StatisticalBrown clusteringHeuristicGene Expression ProfilingApplied MathematicsComputational BiologyComputer Science Applicationslcsh:Biology (General)Gene Expression RegulationMutationlcsh:R858-859.7Data miningSequence AlignmentcomputerSoftwareAlgorithmsBMC Bioinformatics

researchProduct

Data Analysis and Bioinformatics

2007

Data analysis methods and techniques are revisited in the case of biological data sets. Particular emphasis is given to clustering and mining issues. Clustering is still a subject of active research in several fields such as statistics, pattern recognition, and machine learning. Data mining adds to clustering the complications of very large data-sets with many attributes of different types. And this is a typical situation in biology. Some cases studies are also described.

Clustering high-dimensional dataFuzzy clusteringComputer sciencebusiness.industryCorrelation clusteringConceptual clusteringMachine learningcomputer.software_genreComputingMethodologies_PATTERNRECOGNITIONCURE data clustering algorithmConsensus clusteringCanopy clustering algorithmData miningArtificial intelligenceCluster analysisbusinesscomputer

researchProduct

Distance Functions, Clustering Algorithms and Microarray Data Analysis

2010

Distance functions are a fundamental ingredient of classification and clustering procedures, and this holds true also in the particular case of microarray data. In the general data mining and classification literature, functions such as Euclidean distance or Pearson correlation have gained their status of de facto standards thanks to a considerable amount of experimental validation. For microarray data, the issue of which distance function works best has been investigated, but no final conclusion has been reached. The aim of this extended abstract is to shed further light on that issue. Indeed, we present an experimental study, involving several distances, assessing (a) their intrinsic sepa…

Clustering high-dimensional dataFuzzy clusteringSettore INF/01 - Informaticabusiness.industryCorrelation clusteringMachine learningcomputer.software_genrePearson product-moment correlation coefficientRanking (information retrieval)Euclidean distancesymbols.namesakeClustering distance measuressymbolsArtificial intelligenceData miningbusinessCluster analysiscomputerMathematicsDe facto standard

researchProduct

Structural clustering of millions of molecular graphs

2014

We propose an algorithm for clustering very large molecular graph databases according to scaffolds (i.e., large structural overlaps) that are common between cluster members. Our approach first partitions the original dataset into several smaller datasets using a greedy clustering approach named APreClus based on dynamic seed clustering. APreClus is an online and instance incremental clustering algorithm delaying the final cluster assignment of an instance until one of the so-called pending clusters the instance belongs to has reached significant size and is converted to a fixed cluster. Once a cluster is fixed, APreClus recalculates the cluster centers, which are used as representatives for…

Clustering high-dimensional dataFuzzy clusteringTheoretical computer sciencek-medoidsComputer scienceSingle-linkage clusteringCorrelation clusteringConstrained clusteringcomputer.software_genreComplete-linkage clusteringGraphHierarchical clusteringComputingMethodologies_PATTERNRECOGNITIONData stream clusteringCURE data clustering algorithmCanopy clustering algorithmFLAME clusteringAffinity propagationData miningCluster analysiscomputerk-medians clusteringClustering coefficientProceedings of the 29th Annual ACM Symposium on Applied Computing

researchProduct

The Three Steps of Clustering In The Post-Genomic Era

2013

This chapter descibes the basic algorithmic components that are involved in clustering, with particular attention to classification of microarray data.

Clustering high-dimensional dataSettore INF/01 - Informaticabusiness.industryCorrelation clusteringPattern recognitioncomputer.software_genreBiclusteringCURE data clustering algorithmClustering Classification Biological Data MiningConsensus clusteringArtificial intelligenceData miningbusinessCluster analysiscomputerMathematics

researchProduct

A Greedy Algorithm for Hierarchical Complete Linkage Clustering

2014

We are interested in the greedy method to compute an hierarchical complete linkage clustering. There are two known methods for this problem, one having a running time of \({\mathcal O}(n^3)\) with a space requirement of \({\mathcal O}(n)\) and one having a running time of \({\mathcal O}(n^2 \log n)\) with a space requirement of Θ(n 2), where n is the number of points to be clustered. Both methods are not capable to handle large point sets. In this paper, we give an algorithm with a space requirement of \({\mathcal O}(n)\) which is able to cluster one million points in a day on current commodity hardware.

CombinatoricsCURE data clustering algorithmSUBCLUNearest-neighbor chain algorithmCorrelation clusteringSingle-linkage clusteringHierarchical clustering of networksGreedy algorithmComplete-linkage clusteringMathematics

researchProduct